
Conversation

@jvdp1
Collaborator

@jvdp1 jvdp1 commented Oct 25, 2025

Proposal to support strides in the convolutional layers

@Riccardo231 @milancurcic does this approach make sense? If yes, I will continue its implementation.
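
For reference, a minimal self-contained sketch (not the PR code; names and values are illustrative) of the output-size bookkeeping that a strided, padding-free convolution implies:

program stride_output_size
  ! With no padding, consecutive windows start stride elements apart,
  ! so the output width follows the usual floor-division formula.
  implicit none
  integer :: input_width, kernel_width, stride, output_width

  input_width = 28     ! e.g. one MNIST image dimension
  kernel_width = 3
  stride = 2

  output_width = (input_width - kernel_width) / stride + 1

  print '(a, i0)', 'output width = ', output_width   ! prints 13
end program stride_output_size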

@jvdp1 jvdp1 requested review from Riccardo231 and milancurcic and removed request for milancurcic October 25, 2025 18:52
@milancurcic
Member

Yes, it's exactly how I would do it, thanks Jeremie.

@Riccardo231
Collaborator

I think it is a good implementation. Thanks for your work.

@Riccardo231
Collaborator

I'd like to offer my support whenever it's needed; feel free to contact me. Right now I'm busy developing improvements of my own, but I can cooperate with someone else.
@milancurcic @jvdp1

Comment on lines +189 to +190
self % gradient(k,iws:iwe,jws:jwe) = self % gradient(k,iws:iwe,jws:jwe) &
+ gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1)
Collaborator Author

@Riccardo231 Could you check this, please? I think it behaves differently now. However, I am not sure what the goal was before the change, because not all entries of self % gradient were updated (that is, only the entries between istart:iend and jstart:jend were updated).
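
For concreteness, a tiny self-contained example (made-up names, not the PR code) of that behaviour: assigning to an array slice only touches the slice, so entries outside istart:iend keep their previous values.

program slice_update
  implicit none
  real :: gradient(8)
  integer :: istart, iend

  gradient = 0.0
  istart = 3
  iend = 6

  ! Only the slice istart:iend is updated; the edges stay at zero.
  gradient(istart:iend) = gradient(istart:iend) + 1.0

  print '(8f5.1)', gradient   ! 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0
end program slice_update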

Member

I think I implemented the conv2d variant... I'll look at this carefully over the weekend. It's possible that the original code was bad.

Collaborator

I'll have a look tomorrow.

Collaborator

Sorry, just to let you know that these days I'm really busy with school. I can probably have a look after the 5th. Sorry for the delay.

Collaborator Author

> Sorry, just to let you know that these days I'm really busy with school. I can probably have a look after the 5th. Sorry for the delay.

No worries. Whenever you have time. Thank you.

Collaborator

Hello, sadly I have been away for more days than expected. However, I am now ready to commit to the project. Do I still need to take a look, or was it already done? Thank you, and sorry for the delay.

Collaborator Author

@Riccardo231, no problem. If you have time, it would be nice if you could take a look at this PR.

Collaborator

Hello, I just had a look and ran a test with cnn_mnist.f90 without modifying any parameters; both the old and new versions converge to 80% accuracy within 10 epochs. Your changes look good to me, although I didn't take a deep dive; feel free to ping me whenever it's needed.

Member

Hi all, I appreciate your patience with this.

Based on my comment ! dL/dx = dL/dy * sigma'(z) .inner. w, and assuming it is correct given my understanding of how the backward pass of convolutional layers works, this should be an inner product (a sum of element-wise products), rather than element-wise products added element-wise to the gradient.

It is true that the original implementation left the edges of the gradient not updated, but I think this is merely a consequence of summing over the kernel width to a scalar result.

So, I think the previous code was correct; that is, we need the full inner product (including the sum that yields the scalar result), not just an element-wise product and assignment.

If I'm correct, it means that Riccardo's conv1d backward pass should be updated to reflect this.

Let me know what you think.
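
For concreteness, a minimal self-contained sketch (illustrative names and values, not the PR code) of the two operations being contrasted: an element-wise product assigned across a window versus the inner product reduced to a scalar with sum().

program elementwise_vs_inner
  implicit none
  real :: gdz_window(3,3), kernel_window(3,3)
  real :: grad_window(3,3), grad_scalar

  gdz_window = 1.0
  kernel_window = 2.0
  grad_window = 0.0
  grad_scalar = 0.0

  ! Element-wise: each of the 9 window entries gets its own product term.
  grad_window = grad_window + gdz_window * kernel_window

  ! Inner product: the same 9 products are summed into a single scalar,
  ! which is what dL/dx = dL/dy * sigma'(z) .inner. w calls for here.
  grad_scalar = grad_scalar + sum(gdz_window * kernel_window)

  print '(a, f6.1)', 'one element-wise entry: ', grad_window(1,1)   ! 2.0
  print '(a, f6.1)', 'inner product:          ', grad_scalar        ! 18.0
end program elementwise_vs_inner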

Collaborator Author

I think you are right. So, should it be something like:

Suggested change
- self % gradient(k,iws:iwe,jws:jwe) = self % gradient(k,iws:iwe,jws:jwe) &
-   + gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1)
+ self % gradient(k,i,j) = self % gradient(k,i,j) &
+   + sum(gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1))

If correct, conv1d must be revised too, as well as locally_connected.
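
For context, a simplified 1D, single-channel sketch (self-contained, with illustrative bounds; not the PR code and not a claim about which variant is mathematically correct) of the loop structure such a scalar update implies: one sum() per input-gradient entry, over the window of gdz values that overlaps it.

program windowed_sum_update
  implicit none
  integer, parameter :: width = 6, kwidth = 3
  real :: gdz(width - kwidth + 1)   ! one output row (stride 1, no padding)
  real :: kernel(kwidth)
  real :: gradient(width)
  integer :: i, iws, iwe

  gdz = 1.0
  kernel = [1.0, 2.0, 3.0]
  gradient = 0.0

  do i = 1, width
    ! Window of output positions whose receptive field covers input i,
    ! mirroring the iws:iwe / jws:jwe bounds in the suggestion above.
    iws = max(1, i - kwidth + 1)
    iwe = min(size(gdz), i)
    gradient(i) = gradient(i) &
      + sum(gdz(iws:iwe) * kernel(1:iwe-iws+1))
  end do

  print '(6f6.1)', gradient   ! 1.0 3.0 6.0 6.0 3.0 1.0
end program windowed_sum_update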

@jvdp1
Collaborator Author

jvdp1 commented Oct 31, 2025

@milancurcic @Riccardo231 Pending a comment/question, this PR is ready for review and/or to be merged.
